Discontinuity Removal in Concatenative Synthesized Speech

Authors

  • Sarpreet Kaur Gill
  • Parminder Singh
Abstract

Concatenative synthesis joins segments of prerecorded natural human speech. It requires a database of previously recorded speech covering all the segments that may need to be synthesised. A segment may be a phoneme, syllable, word, phrase, or any combination of these. Concatenative speech synthesis is currently the most practical method for generating realistic speech. There are two main issues in concatenative synthesized speech: spectral discontinuity and the join cost of the concatenated sound. A discontinuity occurs when the join between two speech units is clearly audible. The synthesized speech can sound very natural if the discontinuities at the concatenation points are inaudible, but audible joins can be very frustrating to the listener and reduce the overall perceived quality of the synthesized speech. The join cost in unit selection represents the compatibility of two consecutive units. It may be computed as a weighted sum of subcosts, such as differences in F0 and amplitude and mismatches in spectral parameters like MFCCs (Mel Frequency Cepstral Coefficients) and LPC (Linear Predictive Coding) coefficients. Spectral mismatch at the joining points is detected in a series of steps, which requires knowledge of the WAV file format. The Waveform Audio File Format (WAVE, more commonly known as WAV after its filename extension) is a Microsoft and IBM audio file format standard for storing an audio bitstream on PCs.
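The weighted-sum form of the join cost mentioned in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation: the function name `join_cost`, the dictionary keys `f0`, `amp`, and `mfcc`, the Euclidean spectral distance, and the default weights are all assumptions made for illustration. The abstract does not state which distance measure or weights the paper uses.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def join_cost(left, right, weights=(1.0, 1.0, 1.0)):
    """Weighted sum of subcosts between two consecutive units.

    `left` describes the last frame of the preceding unit and `right`
    the first frame of the following unit. Each is a dict with
    hypothetical keys: "f0" (Hz), "amp" (e.g. frame RMS) and "mfcc"
    (list of cepstral coefficients). The weights are illustrative
    values, not tuned ones.
    """
    w_f0, w_amp, w_spec = weights
    f0_diff = abs(left["f0"] - right["f0"])            # pitch mismatch
    amp_diff = abs(left["amp"] - right["amp"])         # amplitude mismatch
    spec_diff = euclidean(left["mfcc"], right["mfcc"])  # spectral mismatch
    return w_f0 * f0_diff + w_amp * amp_diff + w_spec * spec_diff
```

A lower value suggests a smoother join. Two identical boundary frames give a cost of zero, and the weights let one subcost (for example the spectral term) dominate when it matters more perceptually.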


Similar resources

Stages and Method of Preparing Syllable and Diphone Speech Databases for a Persian Text-to-Speech System

Speech databases are part of concatenative text-to-speech synthesis systems. The phonetic quality of the databases plays a significant role in the naturalness of the synthesized speech. This paper introduces syllable and diphone speech databases for Persian and examines how they were developed, their specifications, and their relative advantages. ...


An Unit Selection based Hindi Text To Speech Synthesis System Using Syllable as a Basic Unit

Concatenative speech synthesis using the phoneme, diphone, or allophone as the elementary unit for Hindi speech synthesis requires significant quality improvement. The naturalness of the state-of-the-art waveform synthesizer is attributed to the use of the syllable as the basic unit. The primary reason for choosing the syllable as the basic unit is that the Indian languages are syllable centered. This ...


A Corpus-Based Concatenative Speech Synthesis System for Turkish

Speech synthesis is the process of converting written text into machine-generated synthetic speech. Concatenative speech synthesis systems form utterances by concatenating pre-recorded speech units. Corpus-based methods use a large inventory to select the units to be concatenated. In this paper, we design and develop an intelligible and natural sounding corpus-based concatenative speech synthes...


Improving speech synthesis of CHATR using a perceptual discontinuity function and constraints of prosodic modification

Concatenative synthesis is widely used in TTS to generate synthetic speech with high quality and relatively natural-sounding prosody. Whatever the type of synthesis unit used (diphone, phoneme, etc.), a large speech database is usually needed to ensure the phonetic and phonemic variation of the units in a rich variety of contexts. In the CHATR synthesis system, unit selection finds the most appr...




Publication date: 2017